Prediction of the Secondary Structures of Proteins by Using PREDICT, a Nearest Neighbor Method on Pattern Space
Abstract
We introduce PREDICT (PRofile Enumeration DICTionary), a novel method for predicting the secondary structure of proteins that applies the nearest-neighbor method to a pattern space. For a given protein sequence, PSI-BLAST is used to generate a profile that defines a pattern for each amino acid residue and its local sequence environment. By applying PSI-BLAST to protein sequences with known secondary structures, we construct pattern databases. The secondary structure of a query residue in a protein of unknown structure is then determined by comparing the query pattern with those in the pattern databases and selecting the patterns closest to it. We tested PREDICT on the CB513 set (a set of 513 non-homologous proteins) in three different ways. In the first test, the pattern database was derived from 7777 proteins in the Protein Data Bank (PDB), including proteins homologous to those in the CB513 set, and the average Q3 score per chain was 78.8%. In the second, more stringent benchmark, we removed from the 7777 proteins all proteins homologous to the CB513 set, leaving 4330 proteins; pattern databases built from these proteins gave an average Q3 score of 74.6%. In the third test, we selected one query protein from the CB513 set and built pattern databases from the remaining 512 proteins; repeating this procedure for each of the 513 proteins gave an average Q3 score of 73.1%. Finally, we participated in CASP5 (group ID: 531), using a first-layer database based on the 7777 proteins and a second-layer database based on the CB513 set. PREDICT gave promising results, with an average Q3 score of 78.1% and an average Sov score of 77.4% on 55 CASP5 targets.
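To make the pattern-space idea concrete, here is a minimal Python sketch of nearest-neighbor prediction over profile windows. The abstract does not say how a pattern is built from the profile, which distance is used, or how the selected close patterns are combined, so this sketch assumes (hypothetically) a zero-padded window of 2w+1 profile rows centered on the residue, Euclidean distance, and a majority vote among the k nearest patterns. The function names `window_pattern`, `build_database`, `predict_residue` and `predict_chain` are illustrative only, profiles are plain lists of floats rather than parsed PSI-BLAST output, and the two-layer database used for CASP5 is not modeled.

```python
import heapq
import math
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: one row per residue, e.g. the 20 scores of a PSI-BLAST position-specific profile.
Profile = List[List[float]]
Pattern = List[float]


def window_pattern(profile: Profile, i: int, half_width: int) -> Pattern:
    """Concatenate profile rows i-half_width .. i+half_width; rows beyond the chain ends are zero-filled."""
    n_cols = len(profile[0])
    pattern: Pattern = []
    for j in range(i - half_width, i + half_width + 1):
        if 0 <= j < len(profile):
            pattern.extend(profile[j])
        else:
            pattern.extend([0.0] * n_cols)
    return pattern


def build_database(chains: Iterable[Tuple[Profile, str]], half_width: int) -> List[Tuple[Pattern, str]]:
    """Store one (pattern, observed state) pair per residue of every chain with known structure."""
    db: List[Tuple[Pattern, str]] = []
    for profile, states in chains:
        for i, state in enumerate(states):
            db.append((window_pattern(profile, i, half_width), state))
    return db


def predict_residue(query: Pattern, db: List[Tuple[Pattern, str]], k: int) -> str:
    """Hypothetical combination rule: majority vote over the k stored patterns closest to the query."""
    # Linear scan with Euclidean distance; nsmallest behaves like sorted(...)[:k], so it is stable.
    nearest = heapq.nsmallest(k, db, key=lambda entry: math.dist(query, entry[0]))
    votes = Counter(state for _, state in nearest)
    # most_common keeps first-seen order on ties, so a tie goes to the closer neighbor's state.
    return votes.most_common(1)[0][0]


def predict_chain(profile: Profile, db: List[Tuple[Pattern, str]], half_width: int = 3, k: int = 5) -> str:
    """Predict one state per residue of a query chain."""
    return "".join(
        predict_residue(window_pattern(profile, i, half_width), db, k) for i in range(len(profile))
    )
```

For example, with a training chain whose helix residues have profile rows near `[1, 0]` and whose coil residues have rows near `[0, 1]`, a query whose rows are `[0.9, 0.1]` and `[0.1, 0.9]` is predicted as `"HC"`. The exhaustive scan over every stored pattern is the simplest correct choice; a database built from thousands of chains, such as the 7777-protein set, would likely call for a vectorized or indexed search.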
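The Q3 figures above are three-state accuracies. As a minimal sketch of the usual definition, Q3 for one chain is the percentage of residues whose predicted state (helix, strand or coil) matches the observed state, and an average Q3 per chain averages those per-chain values. The function names below are hypothetical, and the sketch assumes the predicted and observed strings are already reduced to three states and aligned residue by residue.

```python
from typing import Iterable, List, Tuple


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("predicted and observed must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def mean_q3_per_chain(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average of per-chain Q3 values: every chain counts once, whatever its length."""
    scores = [q3(predicted, observed) for predicted, observed in pairs]
    if not scores:
        raise ValueError("no chains given")
    return sum(scores) / len(scores)


def pooled_q3(pairs: Iterable[Tuple[str, str]]) -> float:
    """Q3 over all residues pooled together: long chains weigh more than short ones."""
    pair_list: List[Tuple[str, str]] = list(pairs)
    for predicted, observed in pair_list:
        if len(predicted) != len(observed):
            raise ValueError("predicted and observed must have equal length")
    correct = sum(sum(p == o for p, o in zip(pred, obs)) for pred, obs in pair_list)
    total = sum(len(obs) for _, obs in pair_list)
    if total == 0:
        raise ValueError("no residues given")
    return 100.0 * correct / total
```

The two averages can differ noticeably: for a perfectly predicted 2-residue chain and a 4-residue chain with one correct residue, the per-chain mean is 62.5% while the pooled value is 50%, which is why the abstract's "per chain" qualifier matters when comparing scores.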
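The third test follows a leave-one-out (jackknife) protocol: each CB513 chain is held out in turn, the pattern databases are rebuilt from the other 512 chains, and the held-out chain is predicted. The sketch below shows only that evaluation loop. `train` and `predict` are hypothetical callables standing in for PREDICT's database construction and nearest-neighbor search, and the toy majority-state model exists only so that the example runs; it is not part of the method.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple, TypeVar

Q = TypeVar("Q")  # whatever represents a query chain (sequence, profile, ...)
M = TypeVar("M")  # whatever the training step produces (e.g. a pattern database)


def leave_one_out_q3(
    chains: Sequence[Tuple[Q, str]],
    train: Callable[[List[Tuple[Q, str]]], M],
    predict: Callable[[M, Q], str],
) -> float:
    """Hold out each chain, train on all the others, and return the mean per-chain Q3."""
    scores: List[float] = []
    for i, (query, observed) in enumerate(chains):
        training = list(chains[:i]) + list(chains[i + 1:])  # every chain except the held-out one
        model = train(training)
        predicted = predict(model, query)
        correct = sum(p == o for p, o in zip(predicted, observed))
        scores.append(100.0 * correct / len(observed))
    return sum(scores) / len(scores)


# Hypothetical stand-in model: always predict the most common state seen in training.
def train_majority(training: List[Tuple[str, str]]) -> str:
    counts = Counter("".join(states for _, states in training))
    return counts.most_common(1)[0][0]


def predict_majority(model: str, query: str) -> str:
    return model * len(query)
```

Because the held-out chain never contributes to the database it is predicted from, none of its residues can be matched against themselves, which is what makes this protocol stricter than testing against a database that contains the query.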
Similar resources
Drought Monitoring and Prediction using K-Nearest Neighbor Algorithm
Drought is a climatic phenomenon that can occur under any climatic conditions and in all regions of the Earth. Effective drought management depends on the application of appropriate drought indices. Drought indices are variables used to detect and characterize drought conditions. In this study, an attempt was made to predict drought occurrence based on the standardized precipitation index (SPI), usin...
Liquid-liquid equilibrium data prediction using large margin nearest neighbor
Guanidine hydrochloride has been widely used in the initial steps of recovering active protein from inclusion bodies in aqueous two-phase systems (ATPS). Knowledge of the effects of guanidine hydrochloride on the liquid-liquid equilibrium (LLE) phase diagram behavior is still inadequate, and no comprehensive theory exists for predicting the experimental trends. Therefore the effect the ...
A Novel Fuzzy Based Method for Heart Rate Variability Prediction
In this paper, a novel technique based on a fuzzy method is presented for chaotic nonlinear time series prediction. A fuzzy approach combined with a gradient learning algorithm constitutes the main component of this method. The learning process in this method is similar to the conventional gradient descent learning process, except that the input patterns and parameters are stored in mem...
Evaluation Accuracy of Nearest Neighbor Sampling Method in Zagross Forests
Collection of appropriate qualitative and quantitative data is necessary for proper management and planning. Suitable inventory methods must be used, and the accuracy of a sampling method depends on the inventory grid and the number of sample points. The nearest neighbor sampling method is one of the distance methods and is calculated with three equations (Byth and Ripley, 1980; Cottam and Curtis, 1956 and Cota...
Publication date: 2004